Google Newspaper Search - Image Processing and Analysis Pipeline
Abstract
The Google Newspaper Search program was launched on September 8, 2008 [1]. In this paper, we outline the technology underlying this large and complex project. We have built a production pipeline that takes newspaper microfilm as input and emits individual news articles as output. These articles are then indexed and added to the content base so that they appear in response to Google searches. Thus, for the Google query “Hitler death”, we are able to show newspaper articles from the very day the event was reported. Non-uniform illumination, significant noise, and tears and scratches in the microfilm images all pose special challenges for this project. The significant variation in layouts across newspapers and eras, and the variation in font sizes within a single page (which confuses the OCR engine), compound the difficulties. The project has continued since the initial launch, which included about 15 million news articles.
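To make the non-uniform illumination problem concrete, here is a minimal sketch of one common correction: estimate the slowly varying background with a local mean and subtract it. The abstract does not say which method the pipeline uses, so this technique, the function names `box_mean` and `flatten_illumination`, and the `radius` parameter are all illustrative assumptions and not the authors' implementation.

```python
# Illustrative sketch only: background flattening by local-mean subtraction.
# The technique and all names here are assumptions, not the paper's method.

def box_mean(img, radius):
    """Local mean over a (2*radius+1)^2 window, clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sum(img[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1))
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def flatten_illumination(img, radius=2):
    """Subtract the local-mean background and restore the global brightness."""
    h, w = len(img), len(img[0])
    background = box_mean(img, radius)
    global_mean = sum(map(sum, img)) / (h * w)
    return [
        [img[y][x] - background[y][x] + global_mean for x in range(w)]
        for y in range(h)
    ]


def make_test_page(size=8):
    """Synthetic page: brightness ramps left to right, with one dark 'ink' pixel."""
    page = [[100.0 + 10.0 * x for x in range(size)] for _ in range(size)]
    page[4][4] -= 80.0
    return page


page = make_test_page()
flat = flatten_illumination(page, radius=2)
# Compare the brightness spread of an ink-free row before and after correction.
print("spread before:", max(page[0]) - min(page[0]))
print("spread after: ", max(flat[0]) - min(flat[0]))
```

A box filter keeps the sketch dependency-free. Its window has to be larger than the text strokes, otherwise the ink itself is absorbed into the background estimate.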
Publication date: 2009